Compiling and Using the IJS-ELAN Parallel Corpus
Similar resources
Normalising the IJS-ELAN Slovene-English Parallel Corpus for the Extraction of Multilingual Terminology
Various efforts have been made for the development of tools and methods dedicated to the automatic processing of multilingual terminology databases. For that purpose, multilingual parallel corpora have been used as a basis resource. However, most of the neologisms in technical and scientific domains are realised by multiword terms that are rarely identified in parallel corpora. In this paper, w...
Slovene-English Datasets for MT
Advances in machine translation are becoming increasingly dependent on the availability of large scale language resources, in particular parallel corpora. The talk presents Slovene-English language resources that were developed as datasets for translation studies and machine learning programs. Three parallel datasets are introduced: the MULTEXT-East multilingual word-annotated corpus, the IJS-E...
Compilation and Exploitation of the IJS-ELAN Parallel Corpus
With more and more text being available in electronic form, it is becoming relatively easy to obtain digital texts together with their translations. The paper presents the processing steps necessary to compile such texts into parallel corpora, an extremely useful language resource. Parallel corpora can be used as a translation aid for second-language learners, for translators and lexicographers...
Statistical machine translation from Slovenian to English
In this paper, we analyse three statistical models for the machine translation of Slovenian into English. All of them are based on the IBM Model 4, but differ in the type of linguistic knowledge they use. Model 4a uses only basic linguistic units of the text, i.e., words and sentences. In Model 4b, lemmatisation is used as a preprocessing step of the translation task. Lemmatisation also makes i...
Slovenian to English Machine Translation using Corpora of Different Sizes and Morpho-syntactic Information
Word based statistical machine translation has emerged as a robust method for building machine translation systems. Inflective languages point out some problems with the approach. Data sparsity is one of them. It can be partly solved by enlarging the training corpus and/or including richer linguistic information: lemmas and morpho-syntactic features. Acquisition of a large bilingual parallel co...
Journal: Informatica (Slovenia)
Volume: 26
Publication date: 2002